Understanding Document Aboutness Step Two: Identifying Interesting Things

Authors

  • Michael Gamon
  • Arjun Mukherjee
  • Patrick Pantel
Abstract

We define the notion of an interesting nugget in a document. Such nuggets attract a user's attention and lead them to explore more information around that nugget. In order to measure and model interestingness, we look at browsing sessions within Wikipedia and we build a data set of transitions (clickthrough) from a source Wikipedia page to a destination Wikipedia page through anchor clicks. We investigate factors that influence the probability of a click on an anchor in a Wikipedia page. We propose a topic modeling approach which jointly models the contents of the source and destination pages. We then use the estimated posterior on latent variables as features, along with page structure and user metadata features to build a model of interestingness. Finally, we evaluate this model using different feature sets and we demonstrate the model's effectiveness at predicting interesting nuggets. Experimental results show that the latent semantic features are effective in predicting interestingness and can outperform baseline features.

Similar resources

Understanding Document Aboutness - Step One: Identifying Salient Entities

We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities are important, or central, to the document, which can lead to degraded relevance for entity triggered experiences. We address this problem by devising a ...

Preliminary Analyses of Information Features Provided by Users for Identifying Music

This paper presents preliminary findings based on the analyses of user-provided information features found in 566 queries seeking help in the identification of particular music works or artists. Queries were drawn from the answers.google.com (Google Answers) website. The types and frequency of occurrences of different information features are compared with the results from previous studies of m...

An Implementation of Symbolic Aboutness Theory

Today information can be shared globally via the Internet and accessed from anywhere in the world. The increasing complexity and size of the WWW create a need for more effective information processing techniques, such as information retrieval and filtering, information summarization, topic segmentation, data mining and information discovery, etc. All of them can be fundamentall...

Aboutness from a commonsense perspective

Information retrieval (IR) is driven by a process which decides whether a document is about a query. Recent attempts spawned from logic-based information retrieval theory have formalized properties characterizing “aboutness”, but no consensus has yet been reached. The proposed properties are largely determined by the underlying framework within which aboutness is defined. In addition, some prop...

A commonsense aboutness theory for information retrieval modeling

Information retrieval (IR) can be viewed as a process to determine the “aboutness”, or sometimes “relevance”, relationship between information carriers (e.g. document and query). Thus, the concept of aboutness lies at the heart of IR. A better understanding of aboutness would lead to more effective IR systems. In this paper, we give a review of the status of current research on aboutness. It is...

Publication date: 2014